Improving Actor-Critic Reinforcement Learning via Hamiltonian Monte Carlo Method

Authors

Abstract

Actor-critic RL is widely used in various robotic control tasks. By viewing actor-critic RL from the perspective of variational inference (VI), the policy network is trained to obtain the approximate posterior of actions given the optimality criteria. However, in practice, the policy network may yield suboptimal estimates due to the amortization gap and insufficient exploration. In this work, inspired by the previous use of Hamiltonian Monte Carlo (HMC) in VI, we propose to integrate actor-critic RL with HMC, which is termed as Hamiltonian Policy. As such, actions are evolved from the base policy according to HMC, and our proposed method has many benefits. First, HMC can improve the policy distribution to better approximate the posterior and hence reduce the amortization gap. Second, HMC can also guide exploration toward regions of the action space with higher Q values, enhancing exploration efficiency. Further, instead of directly applying HMC to RL, we propose a new leapfrog operator to simulate the Hamiltonian dynamics. Finally, in safe RL problems, we find that the proposed method not only improves the achieved return but also reduces safety constraint violations by discarding potentially unsafe actions. With comprehensive empirical experiments on continuous control baselines, including MuJoCo and PyBullet Roboschool, we show that our approach is a data-efficient and easy-to-implement improvement over previous actor-critic methods.
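The abstract's central step, evolving an action sampled from the base policy with leapfrog updates under the potential U(a) = -Q(s, a), can be illustrated with a short sketch. The following is a minimal illustration, not the authors' released code: `q_net`, the step size, and the number of leapfrog steps are assumptions, and the Metropolis accept/reject step of full HMC is omitted for brevity.

```python
import torch

def hmc_refine_action(q_net, state, base_action, n_steps=3, step_size=0.01):
    """Evolve an action with leapfrog steps under U(a) = -Q(s, a) (sketch)."""
    a = base_action.detach().clone().requires_grad_(True)
    p = torch.randn_like(a)  # momentum resampled from N(0, I)

    def grad_U(action):
        # dU/da = -dQ/da, so the dynamics drift toward higher-Q actions
        q = q_net(state, action)
        return torch.autograd.grad(-q.sum(), action)[0]

    p = p - 0.5 * step_size * grad_U(a)  # initial half step for momentum
    for i in range(n_steps):
        a = (a + step_size * p).detach().requires_grad_(True)  # position step
        if i < n_steps - 1:
            p = p - step_size * grad_U(a)  # full momentum step
    p = p - 0.5 * step_size * grad_U(a)  # final half step for momentum
    return a.detach()
```

In this reading, the refined action replaces the raw policy sample at interaction time, which is what lets the critic's gradient steer exploration toward high-Q regions.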


Related Articles

Dynamic Control with Actor-Critic Reinforcement Learning

Contents excerpt: 4 Actor-Critic Marble Control; 4.1 R-code; 4.2 The critic; 4.3 Unstable actors; 4.4 Trading off stability against...


Supervised Actor-Critic Reinforcement Learning

Editor’s Summary: Chapter ?? introduced policy gradients as a way to improve on stochastic search of the policy space when learning. This chapter presents supervised actor-critic reinforcement learning as another method for improving the effectiveness of learning. With this approach, a supervisor adds structure to a learning problem and supervised learning makes that structure part of an actor-...


Pretraining Deep Actor-Critic Reinforcement Learning Algorithms With Expert Demonstrations

Pretraining with expert demonstrations has been found useful in speeding up the training of deep reinforcement learning algorithms, since less online simulation data is required. Some approaches use supervised learning to speed up feature learning; others pretrain the policies by imitating expert demonstrations. However, these methods are unstable and not suitable for actor-c...
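For context, a bare-bones version of the policy-pretraining idea mentioned here, behavior cloning an actor on expert (state, action) pairs before RL training, might look like the following sketch; `actor` and the expert tensors are hypothetical inputs, not taken from the cited work.

```python
import torch

def pretrain_actor(actor, expert_states, expert_actions, epochs=10, lr=1e-3):
    """Fit the actor to expert actions by regression before RL fine-tuning."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    for _ in range(epochs):
        pred = actor(expert_states)  # predicted actions for expert states
        loss = torch.nn.functional.mse_loss(pred, expert_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return actor
```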


Intensive versus Non-intensive Actor-Critic Reinforcement Learning Algorithms

Reinforcement learning algorithms usually employ the agent's consecutive actions to construct gradient estimators that adjust the agent's policy. The policy is the result of a kind of stochastic approximation. Because of the slowness of stochastic approximation, such algorithms are usually much too slow to be employed, e.g., in real-time adaptive control. In this paper we analyze the replacing of the...


Actor-Critic Reinforcement Learning with Energy-Based Policies

We consider reinforcement learning in Markov decision processes with high-dimensional state and action spaces. We parametrize policies using energy-based models (particularly restricted Boltzmann machines) and train them using policy gradient learning. Our approach builds upon Sallans and Hinton (2004), who parameterized value functions using energy-based models, trained using a non-linear var...
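As a rough illustration of the energy-based parametrization described here, a Boltzmann policy over a discrete action set can score each candidate action by the negative free energy of a binary-hidden RBM over the joint (state, action) vector. The sketch below is a toy rendering under assumed shapes and names, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

class RBMPolicy(torch.nn.Module):
    """pi(a|s) proportional to exp(-free_energy([s, a])), binary-hidden RBM."""
    def __init__(self, n_state, n_action, n_hidden):
        super().__init__()
        n_visible = n_state + n_action
        self.W = torch.nn.Parameter(0.01 * torch.randn(n_visible, n_hidden))
        self.v_bias = torch.nn.Parameter(torch.zeros(n_visible))
        self.h_bias = torch.nn.Parameter(torch.zeros(n_hidden))

    def free_energy(self, v):
        # F(v) = -v . b_v - sum_j softplus((v W + b_h)_j)
        return -v @ self.v_bias - F.softplus(v @ self.W + self.h_bias).sum(-1)

    def action_probs(self, state, action_set):
        # Enumerate one-hot candidate actions; softmax their negative energies
        sa = torch.cat(
            [state.unsqueeze(0).expand(action_set.size(0), -1), action_set],
            dim=-1,
        )
        return torch.softmax(-self.free_energy(sa), dim=0)
```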


Journal

Journal Title: IEEE Transactions on Artificial Intelligence

Year: 2022

ISSN: 2691-4581

DOI: https://doi.org/10.1109/tai.2022.3215614